Welcome
to the “land of wizards.” This implementation of SSAS, as with older
versions of SSAS, is heavily wizard oriented. SSAS has a Cube Wizard, a
Dimension Wizard, a Partition Wizard, a Storage Design Wizard, a Usage
Analysis Wizard, a Usage-Based Optimization Wizard, an Aggregation
Wizard, a Calculated Cells Wizard, a Mining Model Wizard, and a few
other wizards. All of them are useful, and many of their capabilities
are also available through editors and designers. Wizards are
helpful for those who want some structure in the definition
process and who are content to rely on defaults for much of what they need.
The wizards are also plug-and-play oriented and have been made
available in all SQL Server and .NET development environments. In other
words, you can access these wizards from wherever you need to, when you
need to. All the wizard-based capabilities can also be coded in MDX,
DMX, and ASSL.
Figure 1
shows how SSAS fits into the overall scheme of the SQL Server 2008
environment. SSAS has become completely integrated into the SQL Server
platform. Through mechanisms such as SSIS and direct data source
access, you can funnel a vast amount of data
into the SSAS environment. Most of the cubes you build will likely be
read-only because they will be for BI. However, a write-enabled
capability (WriteBack) is available in SSAS for situations that meet
certain data updatability requirements.
As you can also see in Figure 1,
the basic components in SSAS are all focused on building and managing
data cubes. SSAS consists of the analysis server, processing services,
integration services, and a number of data providers. SSAS offers both
server-based and client-based (local) capabilities, which together
provide a complete platform for OLAP.
You create cubes by
preprocessing aggregations (that is, precalculated summary data) that
reflect the desired levels within dimensions and support the type of
querying that will be done. These aggregations provide the mechanism
for rapid and uniform response times to queries. You create them before
the user uses the cube. All queries utilize either these aggregations,
the cube’s source data, a copy of this data in a client cube, data in
cache, or a combination of these sources. A single Analysis Server can
manage many cubes. You can have multiple SSAS instances on a single
machine.
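To make the idea of preprocessed aggregations concrete, here is a minimal Python sketch. It is not how SSAS stores or selects aggregations internally; the fact rows, dimension names, and the `precompute_aggregations` and `query` helpers are all hypothetical. It simply sums a units measure for every combination of dimensions ahead of time so that a later query becomes a lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, month, country, units).
FACTS = [
    ("laptop", "2008-01", "US", 10),
    ("laptop", "2008-01", "FR", 4),
    ("laptop", "2008-02", "US", 7),
    ("server", "2008-01", "US", 2),
]
DIMENSIONS = ("product", "month", "country")


def precompute_aggregations(facts):
    """Sum units for every subset of dimensions before any query arrives."""
    aggregations = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            totals = defaultdict(int)
            for row in facts:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # last column is the units measure
            names = tuple(DIMENSIONS[i] for i in dims)
            aggregations[names] = dict(totals)
    return aggregations


# Built once, ahead of time (the "processing" step in this sketch).
AGGS = precompute_aggregations(FACTS)


def query(group_by, key):
    """Answer from the precomputed totals instead of rescanning the facts."""
    return AGGS[tuple(group_by)].get(tuple(key), 0)


print(query(("product",), ("laptop",)))                   # 21
print(query(("product", "country"), ("laptop", "US")))    # 17
```

Precomputing every combination grows quickly as dimensions are added, which is one reason a real design usually materializes only a chosen subset of aggregations and falls back to source data for the rest.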
By orienting around the Unified Dimensional Model (UDM),
SSAS allows for the definition of a cube that contains data measures
and dimensions. Each cube dimension can contain a hierarchy of levels
to specify the natural categorical breakdown that users need to drill
down into for more details.
The data values within
a cube are represented by measures (the facts). Each measure of data
might utilize different aggregation options, depending on the type of
data. Unit data might require the SUM (summarization) function, Date of Receipt data might require the MAX
function, and so on. Members of a dimension are the actual level
values, such as the particular product number, the particular month,
and the particular country. SSAS imposes few practical limits on object
counts: the maximum for most object types (for example, dimensions in a
database, attributes in a dimension, databases in an instance, levels in
a hierarchy, cubes in a database, measures in a cube) is 2,147,483,647,
the largest signed 32-bit integer. In reality, you
will probably not have more than a handful of dimensions. Remember that
dimensions are the paths to the interesting facts. Dimension members
should be textual and are used as criteria for queries and as row and
column headers in query results.
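The point that each measure may need its own aggregation function can be sketched in a few lines of Python. This is only an illustration under assumed data: the rows, the measure names, and the `aggregate_by_member` helper are hypothetical and do not correspond to any SSAS API.

```python
from datetime import date

# Hypothetical mapping of each measure to its aggregation function:
# unit counts are summed, while date of receipt keeps the latest value.
MEASURE_AGGREGATORS = {"units": sum, "date_of_receipt": max}

ROWS = [
    {"product": "laptop", "units": 10, "date_of_receipt": date(2008, 1, 5)},
    {"product": "laptop", "units": 7, "date_of_receipt": date(2008, 2, 9)},
    {"product": "server", "units": 2, "date_of_receipt": date(2008, 1, 20)},
]


def aggregate_by_member(rows, dimension):
    """Group rows by a dimension member and aggregate each measure its own way."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[dimension], []).append(row)
    return {
        member: {
            measure: aggregator(r[measure] for r in member_rows)
            for measure, aggregator in MEASURE_AGGREGATORS.items()
        }
        for member, member_rows in grouped.items()
    }


result = aggregate_by_member(ROWS, "product")
print(result["laptop"]["units"])             # 17
print(result["laptop"]["date_of_receipt"])   # 2008-02-09
```

Here the dimension members ("laptop", "server") act as the textual row headers, and the measures carry the numeric or date facts.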
Every cube has a schema from
which the cube draws its source data. The central table in a schema is
the fact table that yields the cube’s data measures. The other tables
in the schema are the dimension tables that are the source of the cube
dimensions. A classic star-schema data warehouse design has this
central fact table along with multiple dimension tables. This is a
great starting point for OLAP cube creation, as you can see in Figure 2.
Here, we show you a high-tech company’s computer sales star-schema data
warehouse that can serve as the source for building an OLAP cube
within SSAS.
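A star schema of this shape can be sketched with Python's built-in `sqlite3` module. The table and column names below (`fact_sales`, `dim_product`, `dim_time`) are hypothetical and are not taken from Figure 2; the sketch only shows a central fact table joined to its dimension tables to produce summarized measures.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
SCHEMA_AND_DATA = """
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    units INTEGER
);
INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'server');
INSERT INTO dim_time VALUES (1, '2008-01'), (2, '2008-02');
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 7), (2, 1, 2);
"""

# The fact table supplies the measure; the dimension tables supply the labels.
STAR_QUERY = """
SELECT p.product_name, t.month, SUM(f.units)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_time AS t ON t.time_id = f.time_id
GROUP BY p.product_name, t.month
ORDER BY p.product_name, t.month
"""


def build_star_schema():
    """Create the in-memory star schema and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def units_by_product_and_month(conn):
    """Join the fact table to its dimensions and total the units measure."""
    return conn.execute(STAR_QUERY).fetchall()


for row in units_by_product_and_month(build_star_schema()):
    print(row)
```

Each dimension table in the sketch would become a cube dimension, and the `units` column of the fact table would become a measure.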
SSAS allows you to
build dimensions and cubes from heterogeneous data sources. It can
access relational OLTP databases, multidimensional databases, text
data, and any other source that has an OLE DB provider available. You
don’t have to move all your data first; you just connect to its source.
In SSAS, you can also design OLAP cubes from scratch. Then you can have
SSAS create the relational schema of tables in SQL Server that you want
to populate with the transactional data that will drive the OLAP cube.
Essentially, cubes can
be regular or local cubes. Regular cubes are based on real tables as
the data source, have aggregations, and occupy physical storage space
of some kind. If a data source that contributes to this cube changes,
the cube must be reprocessed. Figure 3 shows this cube representation and the partitions that make it up.
Local cubes are entirely
contained in portable SSAS files (that is, tables) and can be browsed
without a connection to an SSAS instance. This is much like working in
“disconnected” mode.
Write-enabled
dimensions within a cube allow updates (that is, writes) to data, and
those updates can be written back to the data sources.
Following is a quick summary of all the essential cube terms in SSAS:
Database— A database is a logical container of one or more cubes. Cubes are defined within Analysis Server databases.
Cube— A cube is a multidimensional representation of the business facts. Types of cubes are regular and local.
Data source— The data source is the origin of a cube’s data.
Measure group—
This group is a collection (or grouping) of one or more measures into
some type of logical unit for business purposes. A measure group does
not occupy any physical space. It is metadata only.
Measure— A measure represents a data fact, typically a numeric value such as price, units, or quantity.
Cell—
A cell is the part of a data measure that is at the intersection of the
dimensions. The cell contains the data value. If an intersection (that
is, cell) has no value yet, it does not physically exist until it is
populated.
Dimension—
A cube’s dimension is defined by the aggregation levels of the data
that are needed to support the data requirements. A dimension can be
shared with other cubes, or it can be private to a cube. The structure
of a dimension is directly related to the dimension table columns,
member properties, or structure of OLAP data mining models. This
structure becomes the hierarchy and should be organized accordingly.
You can also have strict parent/child dimensions in which two columns
are identified as being parent and child and the dimension is organized
according to them. In a regular dimension, each column in the dimension
contributes a hierarchy level.
Level— A
level includes the nodes of the hierarchy or data mining model. Each
level contains the members. Millions of members are possible for each
level.
Partition—
A cube is made up of one or more partitions. Using a partition is a way to
physically separate parts of a cube. This separation essentially lets
you deal with individual slices of a data cube separately, querying
only the relevant data sources. If you partition by dimension, you can
perform incremental updates to change that dimension independently of
the rest of the cube. Consequently, you have to reprocess only the
aggregations that are affected by those changes. This is an excellent
feature for scalability.
Hierarchy—
A hierarchy is a set of members in a dimension and their position
relative to each other. Hierarchies can be either balanced or
unbalanced. Being balanced simply means that all branches of the
hierarchy descend to the same level. An unbalanced hierarchy allows for
branches to descend to different levels. It is also possible to define
more than one hierarchy for a single dimension. A great example of this
is “fiscal calendar time” and “Gregorian calendar time” being defined
in one dimension—a Time dimension that contains both time.gregorian and time.fiscal.
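The difference between balanced and unbalanced hierarchies in the list above can be illustrated with a short Python sketch. It assumes a hierarchy is modeled as nested dictionaries of members, which is purely a convenience for the example and not an SSAS structure; `leaf_depths` and `is_balanced` are hypothetical helpers.

```python
def leaf_depths(node, depth=0):
    """Return the set of depths at which leaf members appear."""
    if not node:  # no children: this member is a leaf
        return {depth}
    depths = set()
    for child in node.values():
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_balanced(hierarchy):
    """A hierarchy is balanced when every branch descends to the same level."""
    return len(leaf_depths(hierarchy)) == 1


# Year -> quarter -> month: every branch ends at the month level.
CALENDAR = {"2008": {"Q1": {"Jan": {}, "Feb": {}}, "Q2": {"Apr": {}}}}

# An organization chart in which one branch stops a level early.
ORG_CHART = {"CEO": {"VP Sales": {"Rep": {}}, "Assistant": {}}}

print(is_balanced(CALENDAR))   # True
print(is_balanced(ORG_CHART))  # False
```

A dimension with two hierarchies, such as the fiscal and Gregorian calendars mentioned above, would simply hold two such trees over the same underlying members.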
As mentioned previously, SSAS
has many wizards. Which wizards you use depends on what you need to
create.